Sampling

Overview

  • You will hopefully already be familiar with the concept of sampling, and why we do it.

  • In this course we will specifically look at examples used for environmental data.

  • Some of this will be revision, but there may also be some new methods.

Environmental Sampling

  • We use samples in environmental & ecological data in situations where it is not possible to measure the entire population.

  • In environmental settings, this could be because:

    • The population is too large.

    • Some or all of the population is difficult, expensive or even impossible to reach.

    • The samples may be destructive, i.e. taking the sample causes permanent damage to the object being measured.

  • We want to use the information we obtain on the sample in order to make inference on the population.

Definitions

Population

The population is the set of all possible objects that could be sampled.

Sampling Units

The sampling units are the members of the population, i.e. the objects that could be sampled.

Sample

The sample is a collection of sampling units, i.e. a subset of the population.

Designing an ecological/environmental study

When we design an environmental or an ecological study we should focus on these steps:

  1. Define the study objectives.
  2. Summarise the environmental context.
  3. Identify the target population.
  4. Select an appropriate sampling design.
  5. Implement and summarise.

Step 1: Define the study objectives

We need to define clear and simple objectives for our study.

  • These will typically be properties of our data that we would like to measure
    • Characteristics of a variable, e.g. mean, median, variance.
    • Temporal or spatial trends of a variable.
    • Frequency of events, e.g. number of pollution events, species abundance or occurrence.

Example:

What is the spatial or temporal variability of water quality across a River Surveillance Network (RSN)?

Step 2: Consider the context

  • We have to think about the context of the question we are asking.
  • This means understanding the nature of our data, which is essential to ensuring we have a representative sample.
  • For example, if we’re measuring a river, we need to know about the depth, width and current.
  • If we are sampling in a forest, we need to know about vegetation and wildlife.

Step 3: Identify the target population

  • The population is the set of all possible objects that could be sampled.
    • All the fish in a lake.
    • All oak trees over 5m tall in a particular part of a forest.
    • Every river within a particular water network.
  • Sometimes the size of the population is actually what we are trying to measure, e.g. “How many red squirrels live in the Cairngorms National Park?”

Example: Water quality

Target population: RSN 1:250k with over 1.4 million reaches (a reach is a discrete segment of a river with relatively uniform characteristics).

Characterise environmental conditions of the target population such as Water Quality Indicators, i.e., we need to define our response variable:

  • Macroinvertebrates composition obtained from the RICT Model 44 network (1:50k scale and trimmed to match the RSN network). E.g., WHPT-ASPT (Walley Hawkes Paisley Trigg Average Score Per Taxon) is a biological metric used to evaluate the ecological health of rivers based on the presence and sensitivity of macroinvertebrate (e.g., insects, worms, snails) communities. 

  • Orthophosphate \([\text{PO}_4]^{3-}\) concentrations (mg/L)

Step 4: Select a sampling design

There are a number of sampling designs which are commonly used for environmental data:

  • Simple random sampling.
  • Stratified random sampling.
  • Systematic sampling.
  • Spatial sampling.

We will discuss some of these in more detail during the course.

Step 5: Implement and summarise

  • Data collection - what information is being collected and how?

    • Biological elements - Macrophytes, macroinvertebrates, diatoms.

    • River habitat survey - Physical habitat essential for fish, macrophytes and invertebrates to live.

    • Physico-chemical elements - Water quality elements like dissolved oxygen, orthophosphate, nitrogen.

    • Toxic chemicals - Harmful (potentially banned) chemicals.

    • Invasive non-native species - Plants, animals, fungi, or organisms that have been introduced to a new area where they are not native.

    • Physical properties - Temperature, width, slope, altitude, etc.

Step 5: Implement and summarise

  • Implementation - Deploying the network and measuring the quantities of interest. Some practical challenges include:

    • River is dry

    • Route issues

    • Site is overgrown, fenced or with barbed wire

    • Steep or high banks

    • Land owner permission

    • Safety issues.

Step 5: Implement and summarise

  • Often statisticians will not actually carry out the sampling. We will rely on field experts in many cases.

  • Once we receive the data, it’s important to assess the data for censoring, outliers, missingness etc.

  • We can then fit an appropriate statistical model.

  • Finally, we should report our results in clear language, including uncertainty where appropriate.

Sampling Strategies

  • We are interested in population parameter(s) \(\theta\) .

  • Typically, the value of \(\theta\) is unknown and it is unfeasible to measure all \(N\) elements of the population.

  • We select a representative sample and measure \(n < N\) units to estimate it.

  • The question now, is how do we select such units?

A sampling strategy integrates both sample selection methods from a target population and estimation techniques to infer population attributes from sample measurements.

Flash Re-cap: Estimator properties

Def. Consistency

An estimator \(\hat{\theta}\) of \(\theta\) is said to be consistent if for any \(\epsilon >0\)

\[ \lim_{n\to\infty} \mathbb{P}(|\hat{\theta}-\theta| > \epsilon) = 0 \]

Expected value

The expected value of an estimator is a weighted average of all possible estimates.

\[ \mathbb{E}(\hat{\theta})= \sum_{s\in\Omega} p(s)\hat{\theta}(s) \]

Here \(\Omega\) is the set of all possible samples \(s\). The expected value is a function of both the sampling design (through the probabilities \(p(s)\) of selecting each sample) and the population being sampled (through the sample estimate \(\hat{\theta}(s)\)).

Flash Re-cap: Estimator properties

Bias

The bias of an estimator is the difference between its expected value and the population parameter.

\[ \text{Bias}(\hat{\theta}) = \mathbb{E}(\hat{\theta}) - \theta \]

Variance

Average squared distance between individual estimates \(\hat{\theta}(s)\) and their expected value \(\mathbb{E}(\hat{\theta})\)

\[ \text{Var}(\hat{\theta}) = \sum_{s\in \Omega} p(s) \left[\hat{\theta}(s)-\mathbb{E}(\hat{\theta})\right]^2 \]

Precision

The precision of an estimator is a qualitative measure of how small or large the variability of the estimator is.
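For a very small population we can list every possible sample and evaluate these definitions directly. The sketch below assumes simple random sampling without replacement, so every sample has the same probability \(p(s)\), and uses the sample mean as the estimator; the population values are made up purely for illustration.

```python
from itertools import combinations

# Hypothetical toy population (illustrative values only).
population = [2, 4, 6, 8]
n = 2

# Under simple random sampling without replacement, every possible
# sample s of size n has the same selection probability p(s).
samples = list(combinations(population, n))
p = 1 / len(samples)

# The estimator is the sample mean, evaluated on every possible sample.
estimates = [sum(s) / n for s in samples]

theta = sum(population) / len(population)          # true population mean
expected = sum(p * est for est in estimates)       # E(theta_hat)
bias = expected - theta                            # Bias(theta_hat)
variance = sum(p * (est - expected) ** 2 for est in estimates)  # Var(theta_hat)

print(f"E = {expected:.3f}, bias = {bias:.3f}, variance = {variance:.3f}")
```

In this toy case the expected value equals the population mean, so the sample mean is unbiased under this design, and the variance is roughly 1.67.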

Types of Sampling methods

Simple Random Sampling

  • As the name suggests, this is the simplest form of sampling.

  • Every object in our population has an equal probability of being included in the sample.

  • This requires us to have a complete list of the population members, or a sampling frame covering the entire region.

  • We then generate a set of \(n\) random numbers which identify the individuals or objects to be included in the study.
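A minimal sketch of drawing a simple random sample from a sampling frame is shown below. The frame of site labels and the helper name `simple_random_sample` are hypothetical; the only assumption is that the frame lists every member of the population exactly once.

```python
import random


def simple_random_sample(frame, n, seed=None):
    """Draw n units without replacement; every unit is equally likely."""
    rng = random.Random(seed)  # a seed makes the draw reproducible
    return rng.sample(frame, n)


# Hypothetical sampling frame of 100 labelled sites.
frame = [f"site_{i:03d}" for i in range(1, 101)]
sample = simple_random_sample(frame, n=10, seed=1)
print(sample)
```

Fixing the seed is a design choice that lets the field team and the analyst reproduce the same list of selected units.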

Mean and Variance

  • For a sample of size \(n\), denoted \(y_1, \ldots, y_n\), we can compute the sample mean as

\[\bar{y} = \frac{\sum_{i=1}^n y_i}{n}.\]

  • We can then compute the estimated population variance as

\[s^2 =\frac{\sum_{i=1}^n (y_i - \bar{y})^2}{n-1}.\]

  • As well as estimating the population mean and variance, we also have to think about the uncertainty surrounding these estimates.

  • This is what a confidence interval typically represents.

Sampling Variability

  • Our sample of size \(n\) is just one of many possible samples of size \(n\) which we could have obtained from our population that has size \(N\)  (where \(n < N\)).
  • We must take this into account when considering the uncertainty associated with our sample mean. This is known as sampling variability.
  • We can compute this as: \[\text{Var}(\bar{y}) = \frac{s^2}{n}\left(1 - \frac{n}{N}\right).\]
  • Here, \((1 - \frac{n}{N})\) is what is known as a finite population correction (FPC). This accounts for the proportion of the data which remains unknown.
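The calculation above can be sketched in a few lines. The measurements and the population size \(N = 50\) below are invented for illustration, and the function name is our own.

```python
from statistics import mean, variance


def srs_mean_and_variance(y, N):
    """Sample mean, sample variance and Var(y_bar) with the FPC."""
    n = len(y)
    y_bar = mean(y)
    s2 = variance(y)          # uses the n - 1 divisor
    fpc = 1 - n / N           # finite population correction
    var_y_bar = s2 / n * fpc
    return y_bar, s2, var_y_bar


# Hypothetical measurements from n = 5 of N = 50 units.
y = [12.0, 15.0, 11.0, 14.0, 13.0]
y_bar, s2, var_y_bar = srs_mean_and_variance(y, N=50)
print(y_bar, s2, var_y_bar)
```

Note that as \(n\) approaches \(N\) the correction shrinks the variance towards zero, since almost nothing about the population remains unobserved.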

Example: Cobalt-60

  • Cobalt-60 is a synthetic radioactive isotope of cobalt produced in nuclear reactors.
  • We may be interested in estimating how much of this is in the sediment of a river estuary.
  • This map is colour coded by different sediment types. How might we make use of this information when sampling?

Stratified Sampling

  • Stratified sampling involves dividing the population into two or more groups (or strata) which have something in common.
  • Divide the population of size \(N\) into \(L\) non-overlapping strata such that within-strata variability is less than between-strata variability.
  • We then ensure that each of our strata is represented in our sample, and take this into account.

Proportional allocation

We then ensure that each of these strata is represented proportionally within our sample (known as proportional allocation).

  • Samples are still taken randomly within each stratum.
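Under proportional allocation, stratum \(l\) receives a share of the sample equal to its share of the population. The sketch below uses hypothetical stratum sizes; in general the result is not a whole number and has to be rounded.

```python
def proportional_allocation(stratum_sizes, n):
    """Allocate n samples so that n_l / n equals N_l / N in each stratum."""
    N = sum(stratum_sizes)
    return [n * N_l / N for N_l in stratum_sizes]


# Hypothetical strata of 500, 300 and 200 units, total sample of 50.
allocation = proportional_allocation([500, 300, 200], n=50)
print(allocation)
```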

Mean and variance

  • Let \(N_1, \ldots, N_L\) be the populations of our \(L\) strata, and \(n_1, \ldots, n_L\) be the number of samples taken from each.

  • It is straightforward to obtain sample means \(\bar{y}_1, \ldots, \bar{y}_L\) and sample variances \(s_1^2, \ldots, s_L^2\) for each stratum.

  • Then we compute the overall sample mean as \[\bar{y} = \frac{\sum_{l=1}^L \left( N_l \ \bar{y}_l \right)}{N}.\]

  • We can also compute the variance of the sample mean as

\[\text{Var}(\bar{y}) = \sum_{l=1}^L \left[ \left(\frac{N_l}{N}\right)^2 \frac{s_l^2}{n_l} \left(1 - \frac{n_l}{N_l} \right) \right].\]
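The two stratified formulas translate directly into code. In this sketch the per-stratum summaries (sizes, sample sizes, means and variances) are hypothetical numbers and the function name is our own.

```python
def stratified_estimate(N_l, n_l, means, variances):
    """Stratified mean and the variance of that mean.

    N_l: stratum population sizes, n_l: stratum sample sizes,
    means / variances: per-stratum sample means and sample variances.
    """
    N = sum(N_l)
    y_bar = sum(Nl * m for Nl, m in zip(N_l, means)) / N
    var_y_bar = sum(
        (Nl / N) ** 2 * s2 / nl * (1 - nl / Nl)
        for Nl, nl, s2 in zip(N_l, n_l, variances)
    )
    return y_bar, var_y_bar


# Hypothetical two-stratum example.
y_bar, var_y_bar = stratified_estimate(
    N_l=[600, 400], n_l=[30, 20], means=[10.0, 20.0], variances=[4.0, 9.0]
)
print(y_bar, var_y_bar)
```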

Systematic Sampling

  • Systematic sampling is a sampling method which makes use of a natural ordering that exists in data.

  • We wish to take a sample of size \(n\) from a population of size \(N\), which means sampling every \(k\)th object, where \(k = \frac{N}{n}\).

  • For systematic sampling, we select our first unit at random from the first \(k\) units, then select every \(k\)th unit after it.

  • For example, if we have \(N=50\) and \(n=5\), then \(k=10\).

  • If our first unit is 2, our sample becomes units 2, 12, 22, 32, 42.
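The worked example can be reproduced with a short function. This sketch assumes that \(N\) is a multiple of \(n\), that units are numbered from 1, and that the random start is drawn from the first \(k\) units; the function name is hypothetical.

```python
import random


def systematic_sample(N, n, start=None, seed=None):
    """Return the unit numbers (1-based) of a systematic sample."""
    k = N // n  # sampling interval; assumes N is a multiple of n
    if start is None:
        start = random.Random(seed).randint(1, k)  # random start in 1..k
    return [start + j * k for j in range(n)]


print(systematic_sample(N=50, n=5, start=2))  # [2, 12, 22, 32, 42]
```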

Systematic Sampling

Advantages 😁

  • Convenient and quick.
  • Well spaced across the study.
  • Sort of random: every object has an equal chance of selection.

Disadvantages 😔

  • May not be representative.
  • Systematic patterns in the data can be overlooked.
  • Extremely deterministic: estimation of variance is particularly difficult.

Spatial Sampling

  • Spatial sampling is required when our data have an attribute that is spatially continuous.
  • For example, if we are measuring water quality in a lake, we may have a three-dimensional coordinate system of length, width and depth.
  • Where it is possible to measure at any of these locations, simple random sampling or stratified sampling can be used.
  • There are many examples where it is not possible or convenient to do so, in which case some form of systematic sampling may be used.

Transects

  • Spatial sampling often uses a systematic sampling scheme based on transects.

  • A transect is a straight line along which samples are taken.

  • The starting point, geographical orientation and number of samples are chosen as part of the sampling scheme.

  • Samples will then be either taken at random points along the length of the line (continuous sampling) or systematically placed points (systematic sampling).

Transects

  • Suppose we need to take samples of water quality on a lake.

  • Our sampling scheme may use multiple transects simultaneously.

Distance sampling

  • Distance sampling is a popular method in ecology for estimating animal abundance.

  • Data are obtained by measuring the perpendicular distances from a transect line to detected individuals.

  • The probability of detection is modelled as a parametric function that decreases with increasing distance.
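As an illustration of a parametric detection function, the sketch below uses the half-normal form \(g(x) = \exp\{-x^2/(2\sigma^2)\}\), which is one common choice and is an assumption here rather than something fixed by the method. The scale \(\sigma\) would normally be estimated from the observed distances; the value used below is made up.

```python
import math


def half_normal_detection(distance, sigma):
    """Probability of detecting an animal at a perpendicular distance."""
    return math.exp(-distance ** 2 / (2 * sigma ** 2))


def effective_strip_half_width(sigma):
    """Integral of the half-normal detection function over [0, infinity)."""
    return sigma * math.sqrt(math.pi / 2)


def density_estimate(n_detected, transect_length, sigma):
    """Animals per unit area: n / (2 * L * effective half-width)."""
    return n_detected / (2 * transect_length * effective_strip_half_width(sigma))


sigma = 20.0  # hypothetical scale parameter, in metres
for d in (0, 10, 20, 40):
    print(d, round(half_normal_detection(d, sigma), 3))
```

Detection is certain on the line itself and falls away smoothly with distance, which is what allows the counts to be corrected for animals that were missed.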

Quadrats

  • In some cases, we will instead be interested in trying to understand the frequency of a certain species across space.
  • A quadrat is a tool used in ecology and other settings for this purpose.
  • A series of squares (quadrats) of a fixed size are placed in the habitat of interest, and the species within the quadrats are counted.
  • The number of quadrats, and their positions and orientations are chosen as part of the sampling scheme.

Quadrats

The quadrats shown below were used to study orangutan nests in a region of Borneo.

Grid Sampling

  • It may often be useful to use a regular grid to make sampling convenient and efficient.
  • The grid is overlaid on the spatial region, and a fixed number of samples (usually one) is taken from each grid square.
  • We choose the size of the grid such that the number of squares relates to the number of samples we require.
  • For example, for a region of size 5km \(\times\) 5km, choosing 1km \(\times\) 1km grid squares would give us 25 squares in total.

Types of Grid Sampling

  • Aligned Grid: we take a sample from the same (randomly selected) coordinates within each square.
  • Centrally Aligned Grid: we take a sample from the central coordinates of each square.
  • Unaligned Grid: each grid square has a sample taken from different randomly selected coordinates.
  • Triangular Grid: this is a modified version of the aligned grid where the points are fixed based on a triangular arrangement.
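A minimal sketch of the first three grid types on a rectangular region is given below. The function name and the `kind` labels are our own, the region is assumed to start at the origin, and the triangular grid is left out to keep the example short.

```python
import random


def grid_samples(width, height, cell, kind, seed=None):
    """One sample point per grid square for three grid designs."""
    rng = random.Random(seed)
    nx, ny = int(width // cell), int(height // cell)
    if kind == "aligned":
        # One random offset, reused in every square.
        dx, dy = rng.uniform(0, cell), rng.uniform(0, cell)
    points = []
    for i in range(nx):
        for j in range(ny):
            if kind == "central":
                ox, oy = cell / 2, cell / 2
            elif kind == "aligned":
                ox, oy = dx, dy
            elif kind == "unaligned":
                # A fresh random offset for each square.
                ox, oy = rng.uniform(0, cell), rng.uniform(0, cell)
            else:
                raise ValueError(f"unknown grid type: {kind}")
            points.append((i * cell + ox, j * cell + oy))
    return points


# A 5km x 5km region with 1km squares gives 25 sample points.
central = grid_samples(5, 5, 1, "central")
aligned = grid_samples(5, 5, 1, "aligned", seed=1)
unaligned = grid_samples(5, 5, 1, "unaligned", seed=1)
print(len(central), central[:3])
```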

Summary of Grids

  • The aligned and centrally aligned grids are convenient but may miss systematic patterns in the data.

  • The unaligned grid avoids this, and combines the advantages of simple random sampling and stratified sampling. However, it can be inefficient for collection.

  • The triangular grid can perform well in specific cases where the spatial correlation structure varies with direction.

Example: chlorophyll-a in Lake Balaton

  • Aim: Estimate the average level of chlorophyll-a in Lake Balaton, Hungary.

  • Levels are heavily affected by differences in the levels of nutrients along the length of the lake (known as a “trophic gradient”).

Example: chlorophyll-a in Lake Balaton

  • The population is all possible water samples from the lake, and our sampling units are individual water samples.
  • Stratified random sampling seems appropriate here due to heterogeneity in the lake.
  • We could design transects which cover most regions of the lake.
  • However, there may be difficulty accessing all areas by boat.
  • Also potential issues with the boat itself disrupting the levels.

Sample size

  • A crucial part of sampling is identifying the appropriate sample size for our study.
  • If the sample is too small, it will not be sufficiently representative of the population.
  • If the sample is too big, it will be expensive and time consuming to collect, which may defeat the purpose of using sampling in the first place.

Power and/or precision

  • It is therefore important that we understand exactly what it is that we want from our sampling process.
  • We can think about it in terms of two key aspects, power and precision.
  • Precision: How accurately do I want (or need) to estimate the mean, median, variance etc?
  • Power: How small a difference is it important to detect, and with what degree of certainty?

Using confidence intervals

  • The general form of a \(100(1-\alpha)\%\) confidence interval for the population mean \(\mu\) (a 95% interval when \(\alpha = 0.05\)) is

\[\bar{x} \pm t_{1-\alpha/2} \sqrt{\text{Var}(\bar{x})}.\]

  • The width of the interval is determined by the estimated standard error, \(\sqrt{\text{Var}(\bar{x})}\), and we know the formula for this contains \(n\).

  • Therefore, if we know how wide we need our interval to be (i.e. we know the required precision), we can calculate the \(n\) required to do that.

Choosing the sample size

  • Let our maximum required standard error be denoted as \(U\). Ignoring the finite population correction, we need: \[ \begin{aligned} \sqrt{\text{Var}(\bar{x})} &\leq U \\ \frac{\sqrt{s^2}}{\sqrt{n}} &\leq U \\ \sqrt{n} &\geq \frac{\sqrt{s^2}}{U} \end{aligned} \]

  • Here, \(s^2\) is the sample variance.

  • Can anyone see a problem here?

Choosing the sample size

  • This calculation requires us to use \(s^2\), the variance of our sample.
  • But we don’t have the sample yet. The whole point of this exercise is to determine the size of our sample.
  • So, what can we use instead?

  • Typically, we will use knowledge from prior studies where available, or will commission a small pilot study.

Example: PCB in salmon

  • Polychlorinated biphenyl (PCB) is a carcinogenic pollutant often found in fish.
  • We wish to estimate the mean PCB level in the salmon in a fish farm, and require a precision level (estimated standard error) of at most \(0.1\ \text{mg/kg}\).
  • We know from previous studies that the variance of PCB in salmon flesh is \(3.19^2\), i.e. a standard deviation of \(3.19\ \text{mg/kg}\).
  • How large a sample do we need?

Example: PCB in salmon

  • We can solve our prior equation as

\[n \geq \left(\frac{s}{U} \right)^2 = \left(\frac{3.19}{0.1} \right)^2 = 1017.61.\]

  • Rounding up, we need a minimum sample size of 1018 to obtain the required precision.
  • In some cases this may be impractical, in which case we may have to settle for a lower precision.
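The PCB calculation can be checked with a one-line function. This sketch ignores the finite population correction and the helper name is hypothetical.

```python
import math


def required_sample_size(sd, max_se):
    """Smallest n such that sd / sqrt(n) <= max_se."""
    return math.ceil((sd / max_se) ** 2)


n_required = required_sample_size(sd=3.19, max_se=0.1)
print(n_required)  # 1018
```

Because \(n\) grows with the square of \(s/U\), relaxing the target precision has a large effect: doubling the tolerable standard error to 0.2 cuts the requirement to roughly a quarter.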

Stratified sampling

For stratified sampling, this process is much more complicated.

  • As well as considering the sample size, we also have to think about how to allocate our samples across the strata.
  • A common approach is to specify a cost model, where we take into account the different costs of sampling each stratum.
  • The aim is to minimise the estimated standard error for a given total cost.

Cost model

  • Let \(C\) be the overall cost, let \(c_0\) be the fixed overhead costs of the survey and let \(c_l\) be the cost per sample in stratum \(l\).
  • Then our cost model is \[C = c_0 + \sum_{l=1}^L c_l n_l.\]
  • Typically \(C\) is fixed, and the goal is to select the values of \(n_l\) which allow us to obtain the best possible estimate.

Stratum sample sizes

  • Now let \(\omega_l = N_l/N\) be the proportion of the overall population which is found within stratum \(l\).

  • Also let \(\sigma_l\) be the standard deviation for the population of stratum \(l\).

  • We can then compute the optimum number of samples in each stratum as

\[n_l = n \ \frac{\omega_l \sigma_l /\sqrt{c_l}}{\sum_{k=1}^L \omega_k \sigma_k /\sqrt{c_k}}\]

Notes on stratum sample sizes

  • If the costs are the same for all strata, the equation simplifies to what is known as the Neyman allocation

\[n_l = n \ \frac{\omega_l \sigma_l}{\sum_{k=1}^L \omega_k \sigma_k}\]

  • We often calculate \(n\) by a similar approach to that described for simple random sampling.
  • In practice, the population standard deviation \(\sigma_l\) is often replaced by the sample standard deviation \(s_l\).
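The allocation formula and the cost model can be sketched together. All of the stratum weights, standard deviations and costs below are hypothetical, and the allocations are left unrounded.

```python
import math


def optimal_allocation(n, weights, sds, costs=None):
    """n_l proportional to w_l * sigma_l / sqrt(c_l).

    With costs=None all strata cost the same, which gives the
    Neyman allocation.
    """
    if costs is None:
        costs = [1.0] * len(weights)
    terms = [w * s / math.sqrt(c) for w, s, c in zip(weights, sds, costs)]
    total = sum(terms)
    return [n * t / total for t in terms]


def total_cost(c0, costs, n_l):
    """Cost model C = c0 + sum over strata of c_l * n_l."""
    return c0 + sum(c * nl for c, nl in zip(costs, n_l))


# Hypothetical three-stratum survey.
weights = [0.5, 0.3, 0.2]   # N_l / N
sds = [2.0, 4.0, 6.0]       # stratum standard deviations
costs = [1.0, 4.0, 9.0]     # cost per sample in each stratum

with_costs = optimal_allocation(100, weights, sds, costs)
neyman = optimal_allocation(100, weights, sds)
print(with_costs, neyman, total_cost(100.0, costs, with_costs))
```

In this example the expensive, highly variable third stratum receives fewer samples once costs are included than it would under the Neyman allocation.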

Summary

  • We now know how to determine the correct sample size to use to obtain estimates with the desired precision.
  • This requires some advance knowledge, e.g. the tolerable level of error, the cost of the experiment, and the expected standard deviation.
  • However, bear in mind that we have not considered the possibility of missing data in our sample. Losing some data will impact precision.
  • We’ve also assumed that all data points are independent. This is not always true for time series or spatial data. We will address this later in the course.

Monitoring Networks

A monitoring network is a set of stations placed across a region of interest to gather information about one or more environmental resources.

  • Aim to detect trends and unexpected changes.
  • Standard sampling is adequate in many cases.
  • However, the advantage of networks is that they can change over time.
  • New sites can be added, different variables measured, technology improved.

Example: Countryside Survey

  • The Countryside Survey is a census of the natural resources of the UK’s countryside.

  • The first full survey was in 1978, and it was taken again at 6–10 year intervals until 2019.

  • Since 2019, it has been funded as a “rolling” survey, measuring locations on 5-yearly cycles.

  • The goal is to map changes at various different scales, as well as to understand what is driving those changes.

Example: Countryside Survey

  • Stratified sampling of 1km grid squares from across the UK.
  • Strata chosen based on Institute of Terrestrial Ecology (ITE) land classification and broad habitat.
  • https://countrysidesurvey.org.uk/
  • How do we select the sites/resources to monitor?

Generalised Randomised Tessellation Stratified (GRTS)

GRTS is a form of spatially balanced probability sampling.

  • Method for sampling an extensive environmental resource introduced by [@stevens2004].
  • Builds on random, systematic and stratified sampling procedures while ensuring spatial balance.
  • Used by USEPA and Marine Scotland to design monitoring networks.

Generalised Randomised Tessellation Stratified (GRTS)

  1. Define the region/sampling frame of interest.
  2. Identify the resources/sites to sample.
  3. Determine the inclusion probability of each site/resource (e.g., trees, river reaches, lakes, SPAs etc.).

Example: Lakes monitoring

  • There are \(N= 16\) main lakes and a sample of \(n=4\) is desired.

  • Assuming equal sampling probabilities, the inclusion probability for each lake is \(n/N = 4/16 = 0.25\).

Generalised Randomised Tessellation Stratified (GRTS)

  1. A square bounding box is superimposed onto the sampling frame and is divided into four equally sized square cells (level 1 cells).

  2. Each cell is then randomly labelled/numbered, e.g., \(\mathcal{A}_1\equiv \{a_1:a_1 = 0,1,2,3\}\).

Generalised Randomised Tessellation Stratified (GRTS)

  1. Each square is split into four more tessellations which are again randomly numbered while retaining the first-level label.
  2. The set of level-two cells is denoted by \(\mathcal{A}_2 \equiv \{a_1a_2 : a_1 = 0,1,2,3; ~a_2 = 0,1,2,3\}\).

Generalised Randomised Tessellation Stratified (GRTS)

  1. Continue this hierarchical randomisation to the desired spatial scale, giving level-\(k\) cells \(\mathcal{A}_k \equiv \{a_1a_2\cdots a_k : a_1 = 0,1,2,3; \ldots; a_k = 0,1,2,3\}\), until the sum of the inclusion probabilities of the elements within each cell is less than one.
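The hierarchical labelling can be sketched as follows. This is a simplified illustration rather than the algorithm as implemented in any package: it assumes sites lie in the unit square, uses a fixed number of levels instead of the stopping rule above, and the function name and site coordinates are hypothetical.

```python
import random


def grts_address(x, y, levels, rng, labels):
    """Hierarchical base-4 address of a point in the unit square.

    labels caches the random numbering of the four sub-cells of each
    parent cell, so all points in the same cell share the same labels.
    """
    address = ""
    x0, y0, size = 0.0, 0.0, 1.0
    for _ in range(levels):
        size /= 2
        col = int(x >= x0 + size)   # 0 = left half, 1 = right half
        row = int(y >= y0 + size)   # 0 = lower half, 1 = upper half
        if address not in labels:
            perm = [0, 1, 2, 3]
            rng.shuffle(perm)       # random labelling of the sub-cells
            labels[address] = perm
        digit = labels[address][2 * row + col]
        x0, y0 = x0 + col * size, y0 + row * size
        address += str(digit)       # keep the higher-level label as prefix
    return address


rng = random.Random(42)
labels = {}
sites = {"A": (0.10, 0.20), "B": (0.15, 0.30), "C": (0.80, 0.90)}
addresses = {name: grts_address(x, y, 2, rng, labels) for name, (x, y) in sites.items()}

# Sorting by address places the cells on a one-dimensional line.
ordered = sorted(addresses, key=addresses.get)
print(addresses, ordered)
```

Sites that are close in space tend to share the leading digits of their address, so they end up close together on the line.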

Generalised Randomised Tessellation Stratified (GRTS)

  1. Transform the level \(k\) grid cell to a one-dimensional number line by sorting the cells hierarchically (starting from the first-level label).
  • The length of each line segment represents the inclusion probability for a given resource/site (inclusion probabilities should sum up to \(n\)).

The coloured cells indicate sites where small (blue) and large (red) lakes are present.

  2. Use systematic sampling along the line to select the resources to survey. E.g., draw \(u_1 \sim \text{U}(0,1)\) and select the site \(s_1\) whose line segment contains \(u_1\) as the first site to sample. The remaining sites \(s_j\), \(j = 2,\ldots,n\), are those whose segments contain \(u_j = u_{j-1} +1\).

  1. The approach can be modified to allow for unequal inclusion probabilities.
  • E.g., suppose we would like larger lakes to be twice as likely to be selected as small lakes.

  • Instead of giving all lakes the same unit length, we can give large lakes twice the unit length of small lakes.
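The systematic selection along the line can be sketched as below. The sketch assumes the sites are already in their hierarchical order and that each site's segment length is its inclusion probability, so the lengths sum to \(n\); the function name is our own.

```python
import random
from bisect import bisect_right
from itertools import accumulate


def systematic_line_sample(inclusion_probs, u1=None, seed=None):
    """Select sites by stepping along the line in unit increments.

    Returns the (0-based) positions of the selected sites in the
    ordered list.
    """
    cum = list(accumulate(inclusion_probs))   # right end of each segment
    n = round(cum[-1])                        # segment lengths sum to n
    if u1 is None:
        u1 = random.Random(seed).random()     # u_1 ~ U(0, 1)
    points = [u1 + j for j in range(n)]       # u_j = u_{j-1} + 1
    last = len(inclusion_probs) - 1
    return [min(bisect_right(cum, p), last) for p in points]


# Equal probabilities: 16 lakes, n = 4, each segment has length 0.25.
equal = systematic_line_sample([0.25] * 16, u1=0.3)
print(equal)  # [1, 5, 9, 13]

# Unequal probabilities: 4 large lakes get twice the length of 12 small ones.
unequal = systematic_line_sample([0.4] * 4 + [0.2] * 12, u1=0.3)
print(unequal)
```

Because no segment is longer than one unit, stepping in unit increments can never select the same site twice.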

Generalised Randomised Tessellation Stratified (GRTS)

In addition to unequal inclusion probabilities we can also perform stratified sampling.

  • Instead of sampling from the entire sampling frame simultaneously, we divide the sampling frame into distinct sets of sites (strata) and select samples from each stratum independently.

  • The GRTS algorithm is applied to each stratum to obtain stratum-specific samples.

  • The R package spsurvey implements the GRTS algorithm to select spatially balanced samples via the grts() function.

Assessing Environmental Change

  • The purpose of monitoring is to assess the changes in a particular variable over time.
  • This can typically be carried out using standard statistical techniques, taking into account the structure of the data.
  • Sometimes we are interested in whether a specific event has had an impact on the variable, e.g. the effect of new regulations on the air pollution level.
  • Typically this involves assessing the levels before and after the event.

Before-After Designs

  • It is generally very difficult to untangle the effects of a single event.

  • Even if we identify a change in the mean or variance, how do we know that it is due to our event?

  • Many environmental systems change naturally over time for any number of reasons.

  • We don’t have a statistical control. (We can’t turn back the clock and check what would have happened without the event.)

Variable of interest

  • However, this challenge is not unique to environmental data. (We face it regularly in statistics.)
  • Often, we are only interested in the effect of one particular variable, but we have to account for other nuisance variables via regression or other techniques.
  • We can also sometimes account for other unmeasured variability through random effects.
  • The key is to acknowledge what you do and don’t know, and to account properly for uncertainty.

Example: Before-After Single Site

  • Assessing impact of intervention with data for one site. The site of interest is monitored before and after the time of the intervention.

Statistical model

\[X_{ik} = \mu + \alpha_i + \tau_{k(i)} + \varepsilon_{ik}\]

  • \(\mu\): overall mean
  • \(\alpha_i\): effect of period (before/after)
  • \(\tau_{k(i)}\): time within period
  • \(\varepsilon_{ik}\): errors.

Example: Before-After Multiple Sites

  • Assessing impact of intervention with data for multiple sites.

  • Select \(j = 1,\ldots,M\) sites in the impact area and sample before/after intervention.

Statistical model

\[X_{ijk} = \mu + \alpha_i + \tau_{jk(i)} + \delta_j + \varepsilon_{ijk}\]

  • \(\mu\): overall mean
  • \(\alpha_i\): effect of period (before/after)
  • \(\tau_{jk(i)}\): time within period
  • \(\delta_j\): site random effect
  • \(\varepsilon_{ijk}\): errors.

  • Treating the sites as subsamples allows the sites to be used to improve estimation of the effect’s magnitude.

Example: Before-After Control-Impact (BACI)

  • One or more potentially impacted sites, and one or more non-impacted sites, are sampled before and after the time of the intervention.

Statistical model

\[X_{ij} = \mu + \alpha_i + \beta_j + (\alpha\beta)_{ij} + \varepsilon_{ij}\]

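  • \(\mu\): overall mean
  • \(\alpha_i\): effect of period (before/after)
  • \(\beta_j\): effect of location (control/impact)
  • \((\alpha\beta)_{ij}\): period-by-location interaction, which captures the impact of the intervention
  • \(\varepsilon_{ij}\): errors.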
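One simple way to see what the interaction term \((\alpha\beta)_{ij}\) measures is the difference in before-to-after change between the impact and control sites. The sketch below computes this contrast from cell means; the measurements are invented, and in practice the effect and its uncertainty would be estimated by fitting the model.

```python
from statistics import mean


def baci_effect(control_before, control_after, impact_before, impact_after):
    """Change at the impact site minus change at the control site."""
    impact_change = mean(impact_after) - mean(impact_before)
    control_change = mean(control_after) - mean(control_before)
    return impact_change - control_change


# Hypothetical measurements.
effect = baci_effect(
    control_before=[10.0, 11.0, 9.0],
    control_after=[12.0, 11.0, 13.0],
    impact_before=[10.0, 10.0, 10.0],
    impact_after=[15.0, 16.0, 17.0],
)
print(effect)
```

Subtracting the change at the control site removes background changes that affect both locations, which is the main advantage over a plain before-after comparison.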

Summary points

  • We use information we obtain from a sample to make inference on the population.

  • There are five steps to designing a sampling experiment:

    1. Define the study objectives.
    2. Summarise the environmental context.
    3. Identify the target population.
    4. Select an appropriate sampling design.
    5. Implement and summarise.

  • Types of sampling include:

    • Simple Random Sampling.
    • Stratified Sampling.
    • Systematic Sampling.
    • Spatial Sampling (including Transects, Distance Sampling, Quadrats & Grid Sampling).

Summary points

  • Sample size depends on the intended power and precision.

  • E.g., if we want to estimate a mean value \(\bar{x}\), given a maximum required standard error \(U\) and sample variance \(s^2\), then sample size \(n\) must be such that:
    \[ \sqrt{n} \geq \frac{\sqrt{s^2}}{U}. \]

  • For stratified sampling, the optimum number of samples in each stratum \(l\) is calculated as:
    \[ n_l = n \ \frac{\omega_l \sigma_l /\sqrt{c_l}}{\sum_{k=1}^L \omega_k \sigma_k /\sqrt{c_k}} \]

  • A monitoring network is a set of stations placed in a region of interest to gather information about one or more environmental variables.

  • Sampling methods such as GRTS can be used to identify the sites/resources to be monitored.

References